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1 Introduction 


Krylov subspace methods have become a very useful and popular tool for solving large sets 
of linear and nonlinear equations, and large eigenvalue problems. One of the reason for their 
popularity is their simplicity and their generality. These methods have been increasingly 
accepted as efficient and reliable alternative to the more expensive methods that are usu- 
ally employed for solving dense problems. This trend is likely to accelerate as models are 
becoming more complex and give rise to larger and larger matrix problems. 

It is interesting to observe that, surprisingly, there has been very little done in developing 
algorithms to solve the very large matrix problems that arise in control. Yet, there are now 
applications in this area which lead to systems of equations involving very large sparse 
matrices. This is the case for example in models that involve partial differential equations 
in several space dimensions, or in applications related to large space structures [4]. Another 
typical example is that of electrical networks [5]. 

In [24] we proposed a method for partial pole placement, which consists of placing a few 
of the poles of the matrix, namely only those that are unstable. The methods proposed are 
based on projecting the problem onto a small invariant subspace of A associated with the 
unstable eigenvalues. Datta and Saad [9] considered several ways of solving special Sylvester 
equations and some related problems. Recently, we have considered a few numerical methods 
for solving large Lyapunov equations [25]. 

The purpose of this paper is to describe the general concepts used in Krylov subspace 
methods and to give an overview of the different ways in which they are used. As will be 
seen the method is fairly universal in that it can be used in various forms to provide solution 
methods for virtually any linear problem. However, the success of the method depends 
critically on the nature of the matrices at hand. For example conjugate gradient type 
methods are very successful in solving symmetric positive definite or nonsymmetric positive 
real linear systems but have been rather unsuccessful with highly indefinite problems. 

The next section is a brief introduction to Krylov subspaces. Section 3 discusses the 
application of the method to linear systems, while section 4 is on eigenvalue problems. 
Section 5 will be on evaluating exponentials of A times a vector and some applications to 
Lyapunov equations. 

2 Krylov subspaces 

Given a square matrix A and a nonzero vector v, the subspace defined by 

K m = span {u,i4v,A 2 w,...A m-1 v} (1) 

is referred to as a the m-th Krylov subspace associated with the pair (A,u) and is denoted 
by K m (A,v) or simply by K m if there is no ambiguity. We start by stating a few elementary 
properties of Krylov subspaces. Recall that the minimal polynomial of a vector v is the 
nonzero monic polynomial p of lowest degree such that p{A)v = 0. Clearly, the Krylov 
subspace K m is the subspace of all vectors in which can be written as x = p(A)u, where 
p is a polynomial of degree not exceeding m — 1. 
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Proposition 2.1 The Krylov sub space K m is of dimension m if and only if the degree of 
the minimal polynomial of v with respect to A is not less than m. 


In practice it is uncommon that the degree of the minimal polynomial is less than N , 
even in exact arithmetic. If this were to happen then it is usually helpful rather than harmful 
because of the following proposition. 

Proposition 2.2 Let p be the degree of the minimal polynomial ofv. Then is invariant 
under A and K m = for all m > p. 

Thus, in case p is small we can work work in subspace of dimension p and be able to solve 
the problem exactly in this small subspace. 

Working directly with the basis {A>v}j = o,..., m -i is likely to lead to serious numerical 
difficulties. Most Krylov subspace methods utilize either orthogonal or bi-orthogonal bases 
of K m . Thus, the procedure introduced by Arnoldi [1] builds an orthogonal basis of the 
Krylov subspace K m by the following algorithm. 

Arnoldi’s algorithm: 

1. Start: Choose a vector Vi of norm 1. 

2. Iterate: for j = 1, 2, . . . , m compute, 


hij = ( Avi,Vj ) i = 1,2, ...,j (2) 

i 

w = Avj-Y^hijVi (3) 

h i+lJ = hi, ’ _1 (4) 

v,+i = w/h j+ u (5) 


This algorithm is mathematically equivalent to a Gram-Schmidt process applied to the 
power sequence t>, Av, ....,A m_1 t;, in that it would deliver the same sequence of Vj’s in exact 
arithmetic. The algorithm will stop if the vector w computed in (4) vanishes which happens 
if the degree of the minimal polynomial for vis j. This is referred to a ‘lucky* breakdown since 
as was seen above it means that the original problem (linear system, eigenvalue problem) 
can be solved exactly in a ji-th dimensional subspace. 

The following are a few simple but important properties satisfied by the algorithm. 

Proposition 2.3 The vectors vi,v 2 ,. .. ,v m form an orthonormal basis of the subspace K m = 
span{v i , At>i, . . . , A m_1 i/i}. 

Proposition 2.4 Denote by V m the N x m matrix with column vectors t>i,...,v m and by 
H m the mxm Hessenberg matrix whose nonzero entries are defined by the algorithm. Then 
the following relations hold: 

AV m — V m H m *{* h m+ i (fn Vm+ie m 
VZAV m = H m 
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( 6 ) 

( 7 ) 



Note that when A is symmetric then (7) implies that the matrix H m is tridiagonal 
symmetric and as a result Arnoldi’s algorithm simplifies into an algorithm which involves 
only three consecutive vectors at each step. The corresponding algorithm is the well-known 
Lanczos algorithm. 

The second of the relations in the proposition indicates that the Hessenberg matrix H m 
is nothing but the matrix representation of the projection of A onto K m , with respect to 
the orthogonal basis 14,. Analysis of various projection methods based on Krylov subspaces, 
indicate that, loosely speaking, K m contains the most significant information of A, in that 
the outermost eigenvalues of A are well represented by those of its projection onto K mt 
for large enough m. The main idea of Krylov subspace methods is to project the original 
problem into K m . In the next sections we will see how this is done via simple Galerkin type 
procedures, for standard linear algebra problems. 

The relation (6) has been exploited in [9] for solving special Sylvester’s equations that 
arise in the design of reduced-dimensional state estimator. The Arnoldi and block- Arnoldi 
algorithms have been used in [6] to compute numerically the controllability of a linear system. 

3 Krylov subspace methods for solving linear systems 

Given an initial guess z<j to the linear system 

Ax = b, (8) 

a general projection method seeks an approximate solution x m from an affine subspace x<j +K m 
of dimension m by imposing the Petrov-Galerkin condition 

b - Ax m LL m (9) 

where L m is another subspace of dimension m. A Krylov subspace method is a method for 
which the subspace K m is the Krylov subspace 

K m ( A y r 0 ) = span{r 0 , Ar 0t A J r 0f . . - , A m_1 r 0 }, (10) 

in which r 0 = b—Ax 0 . The different versions of Krylov subspace methods arise from different 
choices of the subspaces K m and L m and from the ways in which the system is preconditioned. 
The most common choices of K m and L m are the following. 

1. L m = K m = K m (A,T 0 ). The conjugate gradient method is a particular instance of 
this method when the matrix is symmetric positive definite. Another method in this class is 
the Full Orthogonalization Method (FOM) [21] which is closely related to Arnoldi’s method 
for solving eigenvalue problems [1]. Also in this class is ORTHORES [14], a method that is 
mathematically equivalent to FOM. Axelsson [2] also derived a similar algorithm for general 
nonsymmetric matrices. 

As an example we outline here the FOM method for solving linear systems. Assume that 
we take v% = t'o/|[t'o ||2 and run m steps of Arnoldi’s method described in the previous section. 
Then, the approximate solution is of the form Xo + V m y m where y m is some m-vector. The 
Galerkin condition (9) with L m = K m gives immediately that y m = R“ 1 ||r 0 || 2 Ci. 
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2. L m = AK m ’,K m = K m (A,r 0 ). With this choice of L m , it can be shown, see e.g., [26] 
that the approximate solution x m minimizes the residual norm ||6 — Ax ||2 over all candidate 
vectors in Xo + K m . In contrast, there is no similar optimality property known for methods 
of the first class when A in nonsymmetric. Because of this, many methods of this type have 
been derived for the nonsymmetric case [3, 14, 11, 27]. The Conjugate Residual method [7] 
is the analogue of conjugate gradient method that is in this class. The GMRES algorithm 
[27] is an extension of the Conjugate Residual method to nonsymmetric problems. 

3. L m = K m (A T ,r 0 ); K m = K m (A,r 0 ). Clearly, in the symmetric case this class of 
methods reduces to the first one. In the nonsymmetric case, the biconjugate gradient method 
(BCG) due to Lanczos [15] and Fletcher [12] is a good representative of this class. There 
are various mathematically equivalent formulations of the biconjugate gradient method [22] , 
some of which are more numerically viable than others. An efficient variation on this method, 
called CGS (Conjugate gradient squared) was proposed by Sonneveld [28]. 

Apart from the above three basic methods there axe a number of techniques for non- 
symmetric problems that are mathematically equivalent to solving the normal equations 
A T Ax — A T b or AA T y = b by the conjugate gradient method. We will comment that these 
methods are often too quickly dismissed as inferior because of the fact that the condition 
number of the original problem is squared. For problems that are strongly indefinite they 
do represent, however, the only viable alternative, since none of the above three types of 
methods would work in this situation. 

One of the possible applications of the methods described here is in the frequency response 
calculations in input-output analysis. For example, in the single input single output case, one 
needs to compute c(A—jtoI)~ 1 b for many values of o>. We observe that the Krylov subspaces 
are invariant under arbitrary shifts to the matrix A, i.e., K m (A,v) = K m (A — sl,v) for any 
s. This suggests using the same Krylov subspace K m (A,v) to get approximations to all of 
the solution vectors {A — ju)I)~ l b via the formula 

*-(") = ( 11 ) 

where we have set /3 = ||fc|| and where we assume that the Amoldi algorithm is started with 
Vi = h/H&Ha, i.e., z 0 = 0. This technique was suggested in [10] and can be regarded as an 
extension of the earlier technique proposed by Laub in [16] for dense problems. Numerical 
experiments are currently being performed. 

An important factor in the success of conjugate gradient-like methods is the precondi- 
tioning technique. This typically consists of replacing the original linear system (8) by, for 
example, the equivalent system 

M~ l Ax = M~ l b (12) 

In the classical case of the incomplete LU preconditionings, the matrix M is of the form 
M = LU where L is a lower triangular matrix and U is an upper triangular matrix such that 
L and U have the same structure as the lower and upper triangular parts of A respectively. In 
the general sparse case, the incomplete factorization is obtained by performing the standard 
LU factorization of A and dropping all fill-in elements that are generated during the process. 
This is referred to as ILU(O), or IC(0) in the symmetric case. 
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4 Krylov subspace methods for eigenvalue problems 

An idea that is basic to sparse eigenvalue calculations is that of projection processes [23]. 
Given a subspace K spanned by a system of m orthonormal vectors V = Jvi, . . . , t/ m ] a 
projection process onto K = span {V} computes an approximate eigenpair A G C,ti € K 
that satisfy the Galerkin condition, 


(A - \l)u ± K (13) 

The approximate eigenvalues A are the eigenvalues of the m x m matrix C = V T AV. 
The corresponding approximate eigenvectors are the vectors «,• = Vxji where are the 
eigenvectors of C. Similarly, the approximate Schur vectors are the vector columns of 
VU, where U = [ui,uj, . . . ,u m ] are the Schur vectors of C , i.e., U T CU is quasi-upper 
triangular. Thus, one possible method for computing eigenvalues/ eigenvectors of large 
sparse matrices is to use the Arnoldi process [1, 20] which is a projection process onto 
Km — span{y\, At/i,...,A m_1 vi}. Once the Arnoldi vectors Vi, ...., v m have been generated 
we can use V m for a projection process onto K m . The matrix V^AV m which is needed for 
this purpose is nothing but the upper Hessenberg matrix H m generated by the algorithm. 

Note that the Arnoldi algorithm utilizes the matrix A only to compute successive matrix 
by vector products w = At/, so sparsity can be exploited. As m increases, the eigenvalues 
°f that are located in the outermost part of the spectrum start converging towards 
corresponding eigenvalues of A. However, the difficulty with the above algorithm is that as 
m increases cost and storage increase rapidly. One solution is to use the method iteratively: 
m is fixed and the initial vector t/j is taken at each new iteration as a linear combination 
of some of the approximate eigenvectors. Moreover, there are several ways of accelerating 
convergence by preprocessing t/i by a Chebyshev iteration before restarting, i.e., by taking 
v i = where z is again a linear combination of eigenvectors. 

A technique related to Arnoldi ’s method is the nonsymmetric Lanczos algorithm [18, 8] 
which produces a nonsymmetric tridiagonal matrix instead of a Hessenberg matrix. Unlike 
Arnoldi’s process, this method requires multiplications by both A and A T at every step. On 
the other hand it has the big advantage of requiring little storage (5 vectors). Although no 
comparisons of the performances of the Lanczos and the Arnoldi type algorithms have been 
made, the Lanczos methods are usually recommended whenever the number of eigenvalues 
to be computed is large. 

Finally, if the matrix is banded an efficient solution is the shift and invert strategy which 
consists of using one of the above iterative methods (subspace iteration, Arnoldi, or Lanczos) 
for the matrix (A— <r/) -1 , where <r is some shift chosen say at the center of some small region 
of the complex plane where eigenvalues are sought. The matrix (A - <t/) _ 1 , need not be 
explicitly computed: all we need is to factor (A — trJ) into LU and subsequently at each 
step of the iterative method solve two triangular systems one with L and the other with U. 
Thus band structure can be fully exploited. In [19] several implementations of the shift and 
invert strategy are considered and the problem of avoiding complex arithmetic when A is 
real is addressed. 
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An important application of the eigenvalue algorithms is to provide a small invariant 
subspace that will represent the critical modes of the system. For example unstable modes, 
assuming that there are just a few of them, can then be displaced by a technique described 
in [24]. The main idea in [24] is now summarized. 

Let A be an N x N real nonsymmetric matrix whose eigenvalues are 


Alj ^2> Afc, -^k+1 > 


Aj\r 


and b be a given real vector, 
modified matrix 


The problem considered is to find a vector / so that the 


M = A-bf 


has the given eigenvalues 

/*1 > /^2 j • « • » Afc-j-i )•••) A/V ■ 

In other words we would like to assign the eigenvalues A t , Aj, . . . , A* of A into pi , fi 2 , . . . , pi, > 
while leaving the rest of the spectrum of A unchanged, with the rank one perturbation —bf T . 
This is referred to as the partial pole assignment problem. 

To solve this problem we assume that we have computed the partial Schur factorization 
for A T : 

A t Q = QR , 

where Q is an N x k matrix whose columns form an orthonormal basis of the left invariant 
subspace associated with A<, i = 1, . . . k and R is a k x k upper quasi-tri angular matrix. We 
then seek a solution / in the form / = Qg. Denoting by s the vector s = Q T b, the matrix 
M T Q is such that, 


M t Q = [A T - fb T ] Q = QR- Qgb T Q = Q[R- gs T ] =Q[R T - sg T ] T 

The above equation means that the choice / = Qg makes the subspace spanned by Q 
also invariant under M T . Moreover, the eigenvalues of the matrix M associated with this 
invariant subspace are the eigenvalues of the kxk matrix Cu = R T — &g T • Thus it suffices to 
assign the eigenvalues of this small matrix to be /*j,i = 1, . . . k , by an appropriate choice of 
the vector g. It was shown in [24] that this solves the problem and that the solution is feasible 
under some mild conditions on b and the /t^’s. Nichols [17] proposed improvements on this 
scheme to provide robust partial pole placement techniques. In [9] alternative techniques for 
partial pole placement were derived from a special technique to solve Sylvester’s equation. 


5 Approximations to e A v and applications 

Computing approximations to the exponential of a matrix is usually not too hard a problem 
for small dense matrices. For large matrices, this can be rather difficult to do because of 
the fact that e A will in general be a dense matrix even when A is very sparse. However, it 
is often not the exponential of the matrix that is sought but its product with some vector 
v. The question of approximating e A v for any given vector v was considered in [13] where 
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polynomial and rational approximations to the exponential were used. Here we summarize 
only the method proposed in [13] that is based on polynomial approximation to e A v: 

e A v &p m -i(A)v (14) 

where p m -i is a polynomial of degree m — 1. Thus, the above approximation is an element 
of the Krylov subspace (1) and it is convenient to express it in the orthonormal basis V m = 
[vi, t> 2 » t> 3 , . . . , v TO ] generated by Arnoldi’s algorithm seen earlier. We can write the desired 
approximation to x = e A v as x m = p m (A)v or equivalently x m = V m y where y is an m- vector. 
In [13], the choice y = with /? = ||v||j was suggested, leading to the following formula 

for arbitrary t, 

e^v « (3V m e Hni ei (15) 

The quality of this approximation was also analyzed in [13] and the following result was 
shown. 

Theorem 5.1 Let A be any square matrix and let p — ||A|| 2 . Then the error of the approx- 
imation (15) is such that 

\\e A v - pV m c H ~e i|| 2 < 2(3^-^-. (16) 

mi 

Experiments reported in [13], reveal that this approximation can be very accurate even for 
moderate values of the degree m. The theorem shows convergence of the approximation (15) 
for fixed t, asm increases to oo. However, note that the above approximation is exact when 
m = N, see [13]. 

One application of the above formulas is that one can approximate e tA v for all t as 

e tA v « pV m e tH ”e x . (17) 

This provides a way of solving ordinary differential equations of the type x = Ax + b which 
was the original problem considered in [13]. Moderate degrees can provide reasonably high 
accuracy in the solutions. Another direct application is described in [25], where the control- 
lability Grammian, 

X = jH e rA bb T e rAT dr. (18) 

was approximated by replacing the function e rA b by its approximation (17). Note that X is 
known to be the solution of the Lyapunov equation 

AX + XA t + bb T = 0. (19) 

A rather unexpected result shown in [25] is that the approximation provided by the above 
integration process is nothing but a Galerkin method applied to (19) over the subspace of 
matrices of the form V m GV£, where V m is fixed and G runs over the set ofmxm matrices. 
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6 Conclusion 


The techniques described in the previous section, and other recent developments elsewhere, 
suggest that in many of the problems in control, one can work in a Krylov subspace of 
reasonably small dimension. For example, although the theory is not established for this 
case, the above approach for Lyapunov equations can be extended to Riccati’s equation using 
a Galerkin point of view. Many of the optimization techniques in control can be carried out 
by replacing the full n- dimensional variable x(t) by an approximation derived from replacing 
* by its m dimensional approximation (17). This may open up interesting new approaches 
and theoretical questions as to the accuracy of the corresponding approximations. 
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